home *** CD-ROM | disk | FTP | other *** search
- Name: FFTR2E.ASM
- Type: Assembler Macro
- Version: 1.0
- Last Change: 4-Feb-87
-
- Description: 1024 Point, Non-In-place, Complex FFT Macro
-
- This FFT macro performs a 1024 point complex FFT on external data
- using the Radix 2, Decimation in Time, Cooley-Tukey FFT algorithm [1][2].
- The Cooley-Tukey algorithm is perhaps the simplest and most widely
- used form of FFT. Here we demonstrate several programming techniques
- and memory mapping strategies which take advantage of the DSP56000/1
- architecture to perform very fast FFT's. It does not use straight
- line code to speed up FFT's but rather exploits the use of the
- dual internal data buses and memories and the ability of the external
- bus controller to perform one external (off-chip) access without
- adding any wait states.
-
- FFTR2E starts by performing two Radix 2 passes on external data to
- partition the 1024 complex points into four sets of 256 complex points
- which can be processed independently. We take advantage of the fact
- that half of the FFT coefficients (twiddle factors) are zero for the
- first two passes to reduce the number of operations by one half. The
- remaining coefficients are +1 and -1, allowing add and subtract
- operations to replace coefficient lookup table accesses. Because of
- these facts, the output of the first pass can be kept internal to the
- Data ALU and only the output of the second pass is written back to
- memory. At this point all data is still external. Note that the coding
- of the first two passes is written with only one external access per
- instruction. Assuming the FFTR2E macro is in internal program memory,
- this guarantees that external data can be accessed at the same speed as
- internal data.
-
- Each set of 256 complex points is then moved into the DSP56000/1 internal
- data memories and a Radix 2 FFT is performed on-chip. Thus the
- calculation is non-in-place since the internal data memories are used as
- a temporary workspace. The first pass of each 256 point FFT reads the
- external data input and writes the intermediate data into the internal
- memories. All intermediate passes work on internal data. However, the
- output data from the last pass of each 256 point FFT is written back to
- the external data memories. Thus the output of the 1024 point complex
- FFT overwrites the input data in external data memory.
-
- The implementation uses 24 bit fixed point data storage and 24 bit
- fixed point FFT coefficients (sine and cosine lookup tables). The
- Data ALU maintains 56 bit accumulator precision whenever possible.
- All data is complex, with the real part in X Data memory and the
- imaginary part in Y Data memory. The external data buffer requires
- 1024 X Data and 1024 Y Data memory locations. The algorithm is
- performed "non-in-place", since the internal DSP56000/1 Data RAM's
- are used as additional workspace storage. The input data is assumed
- to be in normal (time-sequential) order and the output is in bit-reversed
- order. By using the reverse-carry address modifier and a separate
- output data buffer, the output data may be easily unscrambled. Other
- methods also exist to unscramble the output data without a separate
- output data buffer. For maximum speed, the FFT macro performs a lookup
- table operation to get new FFT coefficients (called twiddle factors) for
- each group of butterflies. The FFTR2E macro uses "twiddle factor"
- lookup tables (-cosine in X memory and -sine in Y memory) stored in
- external data memory. A SINCOS macro is available to generate these
- tables. The -sine table requires 512 X Data locations and the -cosine
- table requires 512 Y Data locations.
-
- All register initialization is performed by this macro. However, the
- macro assumes that registers which should not be altered by the FFT
- have already been saved by the main program. This allows the user to
- fit the FFT macro into his application and thus control the context
- switching overhead. No data scaling is performed and no overflow
- detection is done. Modifications to this routine could allow it to
- be used with the scaling modes and thus allow dynamic scaling for
- each FFT pass.
-
- The FFTR2E macro requires 104 words of program memory. Using a
- 20.5 MHz DSP56000/1, the FFTR2E macro can perform a 1024 point
- complex FFT in 3.39 milliseconds. Additional algorithm details are
- included in the source file; however, more algorithm description
- would be required for complete understanding by typical users. The
- test program FFTR2ET demonstrates the calling procedure for this
- macro.
-
- References
- ----------
-
- [1] Cooley, J. W., and Tukey, J. W., "An algorithm for the machine
- calculation of complex Fourier series," Math. Comput., vol. 19,
- pp. 297-301, April 1965.
- [2] Bergland, G. D., "A guided tour of the fast Fourier transform,"
- IEEE Spectrum, vol. 6, pp. 41-52, July 1969, also in Digital
- Signal Processing, edited by L. R. Rabiner and C. M. Rader,
- IEEE Press, pp. 228-239, 1972.
-